Conference Proceedings
Towards Q-learning the Whittle index for restless bandits
J Fu, Y Nazarathy, S Moka, PG Taylor
2019 Australian and New Zealand Control Conference (ANZCC 2019) | IEEE | Published: 2019
Abstract
We consider the multi-armed restless bandit problem (RMABP) with an infinite-horizon average-cost objective. Each arm of the RMABP is associated with a Markov process that operates in two modes: active and passive. At each time slot, a controller must designate a subset of the arms to be active; the processes associated with these arms evolve differently from those of the passive arms. Treated as an optimal control problem, the RMABP is known to be computationally intractable to solve optimally. In many cases, the Whittle index policy achieves near-optimal performance and can be found tractably. Nevertheless, computation of the Whittle indices requires knowledge of the transition matrices of th..
Grants
Awarded by Australian Research Council
Funding Acknowledgements
J. Fu and P.G. Taylor's research is supported by the Australian Research Council (ARC) Laureate Fellowship FL130100039 and the ARC Centre of Excellence for the Mathematical and Statistical Frontiers (ACEMS). S. Moka's research is supported by ACEMS, under grant number CE140100049. Y. Nazarathy's research is supported by ARC grant DP180101602. The authors also thank Prof. Vivek Borkar for preliminary discussions.